1.  Introduction

The  change-point  problem  can  be  considered  one  of  the  central  problems  of 
statistical  inference,  linking  together  statistical  control  theory,  theory  of 
estimation  and  testing  hypotheses,  classical  and  Bayesian  approaches,  fixed 
sample  and  sequential  procedures.  It  is  very  often  the  case  that  observations 
are  taken  sequentially  over  time,  or  can  be  intrinsically  ordered  in  some  other 
fashion.  The  basic  question  is,  therefore,  whether  the  observations  represent 
independent and identically distributed random variables, or whether at least
one  change  in  the  distribution  law  has  taken  place. 

This  is  the  fundamental  problem  in  the  statistical  control  theory,  testing 
the  stationarity  of  stochastic  processes,  estimation  of  the  current  position  of 
a time series, etc. Accordingly, a survey of all the major developments in
statistical  theory  and  methodology  connected  with  the  very  general  outlook  of 
the  change-point  problem,  would  require  review  of  the  field  of  statistical 
quality  control,  the  switching  regression  problems,  inventory  and  queueing 
control,  etc.  The  present  review  paper  is  therefore  focused  on  methods  developed 
during  the  last  two  decades  for  the  estimation  of  the  current  position  of  the 
mean  function  of  a  sequence  of  random  variables  (or  of  a  stochastic  process); 
testing  the  null  hypothesis  of  no  change  among  given  n  observations,  against 
the  alternative  of  at  most  one  change;  the  estimation  of  the  location  of  the 
change-point(s)  and  some  sequential  detection  procedures.  The  present  paper 
is  composed  accordingly  of  five  major  sections.  Section  2  is  devoted  to  the 
problem  of  estimating  the  current  position  of  a  sequence  of  random  variables, 
specifically  discussing  the  problem  with  respect  to  possible  changes  of  the 
means  of  independent  normally  distributed  random  variables.  We  review  the 
studies on this problem of Barnard [6], Chernoff and Zacks [14], Mustafi [45]
and others. Section 3 is devoted to the testing problem in a fixed sample. More
specifically, we consider a sample of n independent random variables. The null
hypothesis is $H_0: F_1(x) = \cdots = F_n(x)$, against the alternative
$H_1: F_1(x) = \cdots = F_\tau(x);\ F_{\tau+1}(x) = \cdots = F_n(x)$, where
$\tau = 1, 2, \ldots, n-1$ designates a possible unknown change point. The studies
of Chernoff and Zacks [14], Kander and Zacks [36], Gardner [21], Bhattacharyya
and Johnson [9], Sen and Srivastava [57] and others are discussed. These studies
develop test statistics in parametric and non-parametric, classical and Bayesian
frameworks. Section 4 presents Bayesian
and  maximum  likelihood  estimation  of  the  location  of  the  shift  points.  The  Bayesian 
approach  is  based  on  modeling  the  prior  distribution  of  the  unknown  parameters, 
adopting  a  loss  function  and  deriving  the  estimator  which  minimizes  the  posterior 
risk.  This  approach  is  demonstrated  with  an  example  of  a  shift  in  the  mean  of  a 
normal  sequence.  The  estimators  obtained  are  generally  non-linear  complicated 
functions  of  the  random  variables.  From  the  Bayesian  point  of  view  these  estimators 
are  optimal.  If  we  ask,  however,  classical  questions  concerning  the  asymptotic 
behavior  of  such  estimators,  or  their  sampling  distributions  under  repetitive 
sampling, the analytical problems become very difficult and intractable. The
classical  efficiency  of  such  estimators  is  often  estimated  in  some  special  cases 
by  extensive  simulations.  The  maximum  likelihood  estimation  of  the  location 
parameter  of  the  change  point  is  an  attractive  alternative  to  the  Bayes  estimators. 
Hinkley  [26-30]  investigated  the  asymptotic  behavior  of  these  estimators.  The 
derivation  of  the  asymptotic  distributions  of  these  estimators  is  very  complicated. 
We present in Section 4 Hinkley's approach for the determination of the sampling
distributions  of  the  maximum  likelihood  estimators.  Section  5  is  devoted  to 
sequential  detection  procedures.  We  present  the  basic  Bayesian  and  classical 
results in this area. The studies of Shiryaev [60,61], Bather [7,8], Lorden [43]
and Zacks and Barzily [69] are discussed in some detail. The study of Lorden [43]
is  especially  significant  in  proving  that  Page's  CUSUM  procedures  [47-49]  are 
asymptotically  minimax. 

The important area of switching regressions has not been reviewed here in any
detail. The relevance of the switching regression studies to the change-point
problem is obvious. Regression relationships may change at unknown epochs (change
points), resulting in different regression regimes that should be detected and
identified. The reader is referred to the important studies of Quandt [51,52],
Inselmann and Arsenal [35], Ferreira [19], Maronna and Yohai [44] and others.

An  annotated  bibliography  on  the  change-point  problem  was  published  recently 
by  Shaban  [59].  The  reader  can  find  there  additional  references  to  the  seventy-one 
references  given  in  the  last  section  of  the  present  paper. 

2.  Estimating  the  Current  Position  of  a  Process 

G. Barnard, in his celebrated 1959 paper [6] on control charts and stochastic
processes, suggested considering the problem of estimating the current position of
a process as a tool of statistical control. The problem of estimating the current
mean of a process requires modeling of the possible change mechanism of the mean
function. In the context of statistical control problems the mean, as a function
of time, is generally assumed to commence at an initial point, $\mu_0$, known or
unknown, and then change abruptly at unknown epochs, $t_1, t_2, \ldots$.

Let $X_1, X_2, \ldots, X_n$ be a sequence of random variables. We denote by $\mu_i$
$(i = 1, \ldots, n)$ a location parameter of the distribution of $X_i$. If the
random variables are normally distributed then $\mu_i$ is the expected value
(mean) of $X_i$.

Generally, neither the change points $t_1, t_2, \ldots$ nor the sizes of the changes
are known, and the problem of estimating $\mu_n$, after observing
$X_1, X_2, \ldots, X_n$, might have no better solution than the trivial estimator
$\hat\mu_n = X_n$, unless the phenomenon studied allows proper modeling. In the
present paper we discuss the models adopted by Barnard [6] and by Chernoff and
Zacks [14], and the estimators of the current position which they derived from
these models. The related study of Mustafi [45] is also presented. As will be
shown, time-series procedures of exponential smoothing are strongly related to
the linear unbiased estimators studied in [6] and [14].

2.1  Barnard's Estimator of $\mu_n$

Consider the given sequence of observations in a reversed time manner, i.e.,
$X_n, X_{n-1}, \ldots, X_1$. Barnard adopted the basic assumption that the
corresponding random variables are independent and normally distributed, with the
same known variance ($\sigma_x^2 = 1$). Suppose that the observations are taken at
regular time intervals of 1 unit. Barnard's model assumes that the epochs of
change $t_1, t_2, \ldots$ follow a Poisson process with intensity $\lambda$ (per
time unit). At each of the random change epochs the size of the shift in the mean
is a random variable, $\delta$, following a normal distribution, $N(0, \sigma^2)$.
Moreover, $\delta_1, \delta_2, \ldots$ are mutually independent, and the sequence
$\{\delta\}$ is independent of $\{t\}$. Thus, if $J_1, J_2, \ldots$ designate the
number of change epochs between $X_n$ and $X_{n-1}$, $X_{n-1}$ and $X_{n-2}$,
..., then $J_1, J_2, \ldots$ is a sequence of i.i.d. (independent and identically
distributed) random variables having a Poisson distribution, $P(\lambda)$. The
model is

$$X_n = \mu_n + \epsilon_n, \qquad
  X_{n-i} = \mu_n + \sum_{k=1}^{i} S_k + \epsilon_{n-i}, \quad i = 1, \ldots, n-1,
  \tag{2.1}$$

where $S_k$ is the sum of the $J_k$ shifts $\delta_j$ occurring between
$X_{n-k+1}$ and $X_{n-k}$ ($S_k = 0$ if $J_k = 0$), and
$\epsilon_1, \ldots, \epsilon_n$ are i.i.d. $N(0,1)$. Assuming that $\lambda$ and
$\sigma^2$ are known, Barnard provided the general form of the minimum mean square
error (MSE) linear estimator of $\mu_n$, and
that of its (formal) Bayes estimator (which is actually called by Barnard "the
mean-likelihood estimator"). It is shown that the minimum MSE linear estimator
is the exponential smoothing estimator

$$\hat\mu_n = B X_n + A \hat\mu_{n-1}. \tag{2.2}$$

The (formal) Bayes estimator of $\mu_n$ is of the form

$$\hat\mu_n = \sum_{\{\mathbf{j}_n\}} \pi(\mathbf{j}_n \mid \mathbf{X}_n)\,
  \frac{\mathbf{1}_n' \Sigma^{-1}(\mathbf{j}_n) \mathbf{X}_n}
       {\mathbf{1}_n' \Sigma^{-1}(\mathbf{j}_n) \mathbf{1}_n},
  \tag{2.3}$$

where $\mathbf{1}_n$ is an n-dimensional vector of 1's;
$\mathbf{j}_n = (j_1, \ldots, j_{n-1})$ is a particular realization of
$J_1, \ldots, J_{n-1}$; $\mathbf{X}_n = (X_n, X_{n-1}, \ldots, X_1)'$;
$\pi(\mathbf{j}_n \mid \mathbf{X}_n)$ is its posterior probability, and
$\Sigma(\mathbf{j}_n)$ is the covariance matrix of $\mathbf{X}_n$ corresponding
to a given realization $\mathbf{j}_n$.

2.2  Chernoff and Zacks' Model and BLUE of $\mu_n$

Chernoff and Zacks assumed a model different from that of Barnard, although
there are general similarities. According to their model, if $\mu_i = E\{X_i\}$
then

$$\mu_i = \mu_{i+1} + J_i \delta_i, \quad i = 1, \ldots, n-1, \tag{2.4}$$

where $J_i$ is a random variable assuming the value 1 if there is a shift in the
mean between the ith and (i+1)st observations, and the value 0 otherwise.
Furthermore, $\delta_1, \ldots, \delta_{n-1}$ are $N(0, \sigma^2)$,
$J_1, \ldots, J_{n-1}$ are i.i.d., $P[J_i = 1] = p$ $(i = 1, \ldots, n-1)$. Let
$\mathbf{J} = (J_1, \ldots, J_{n-1})$ and
$\boldsymbol{\delta} = (\delta_1, \ldots, \delta_{n-1})$.
Chernoff and Zacks showed that the minimum variance linear unbiased estimator
(BLUE) of $\mu_n$ is

$$\hat\mu_n = \frac{X_n + \sum_{i=1}^{n-1} \xi_i X_i}{1 + \sum_{i=1}^{n-1} \xi_i},
  \tag{2.5}$$
where

$$\xi_i = \begin{cases}
  \dfrac{v_{i-1} - 1}{v_{i-1} \cdots v_{n-2}\,(v_{n-1} - 1)}, & i = 2, \ldots, n-1 \\[2ex]
  \dfrac{1}{v_1 v_2 \cdots v_{n-2}\,(v_{n-1} - 1)}, & i = 1
\end{cases} \tag{2.6}$$

and

$$v_k = \begin{cases}
  2 + \sigma^2 p, & \text{if } k = 1 \\
  2 + \sigma^2 p - v_{k-1}^{-1}, & \text{if } k = 2, \ldots, n-1.
\end{cases} \tag{2.7}$$

In the following table we illustrate some of these weights:

Table 2.1.  Weights $\xi_i$ for the BLUE, $\sigma^2 = 1$, $p = .1$

   n \ i      1       2       3       4       5

     2      .909   1.000
     3      .763    .840   1.000
     4      .606    .666    .793   1.000
     5      .464    .510    .735    .745   1.000
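
The recursion (2.6)-(2.7) is easy to evaluate numerically. The following is a minimal Python sketch, not code from the original study; the function names `blue_weights` and `blue_estimate` are illustrative, and the argument `c` stands for the product $p\sigma^2$. It reproduces the rows n = 2, 3, 4 of Table 2.1 to three decimals.

```python
def blue_weights(n, c):
    """Weights xi_1, ..., xi_{n-1} of (2.6)-(2.7); c plays the role of p * sigma^2."""
    if n < 2:
        return []
    # Recursion (2.7): v[k-1] holds v_k for k = 1, ..., n-1.
    v = [2.0 + c]
    for _ in range(2, n):
        v.append(2.0 + c - 1.0 / v[-1])
    weights = []
    for i in range(1, n):
        # Denominator of (2.6): v_{i-1} ... v_{n-2} (v_{n-1} - 1);
        # for i = 1 the product starts at v_1.
        denom = v[n - 2] - 1.0
        for k in range(max(i - 1, 1), n - 1):
            denom *= v[k - 1]
        numer = 1.0 if i == 1 else v[i - 2] - 1.0
        weights.append(numer / denom)
    return weights


def blue_estimate(x, c):
    """Estimator (2.5) of the current mean from observations x = [X_1, ..., X_n]."""
    xi = blue_weights(len(x), c)
    weighted = x[-1] + sum(w * xi_x for w, xi_x in zip(xi, x[:-1]))
    return weighted / (1.0 + sum(xi))


if __name__ == "__main__":
    for n in (2, 3, 4):
        print(n, [round(w, 3) for w in blue_weights(n, 0.1)])
```

With `c = 0` all weights equal 1 and `blue_estimate` reduces to the ordinary sample mean, in line with the remark that follows the table.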

Notice that when $p\sigma^2 = 0$ then $\xi_i = 1$ for all $i = 1, \ldots, n-1$. In
this case $\hat\mu_n = \bar X_n$ is the common sample mean. On the other hand,
when $p\sigma^2 \to \infty$ then the weights $\xi_i$ diminish to zero at a
geometric rate, i.e. $\xi_i = O((p\sigma^2)^{-(n-i)})$.

Accordingly, as n increases, the weight given to observations at the beginning
of the sequence is close to zero. In particular, if $p\sigma^2$ is large, it is
sufficient to base the estimator only on the last m observations. Mustafi [45]
investigated the characteristics of such estimators based on the last block of m
observations.
Moreover, Mustafi showed that, if the value of $c = p\sigma^2$ is unknown, it can
be estimated consistently by

$$\hat c = \frac{6 S_1^2 - 2 S_2^2}{S_2^2 - 2 S_1^2}, \tag{2.8}$$

where

$$S_1^2 = \frac{1}{n-1} \sum_{i=1}^{n-1} (X_i - X_{i+1})^2, \qquad
  S_2^2 = \frac{1}{n-2} \sum_{i=1}^{n-2} (X_i - 2X_{i+1} + X_{i+2})^2. \tag{2.9}$$
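
The estimator (2.8) rests on the fact that, under model (2.4) with unit error variance, the mean squared first difference has expectation $2 + c$ and the mean squared second difference has expectation $6 + 2c$, so that the ratio in (2.8) tends to $2c/2 = c$. A minimal Python sketch of the computation follows; the function name `mustafi_c_hat` is illustrative, and the normalizing constants $1/(n-1)$ and $1/(n-2)$ are an assumption about the exact form of (2.9).

```python
def mustafi_c_hat(x):
    """Moment estimator of c = p * sigma^2 from first and second differences.

    x is a list of observations in time order; unit error variance is assumed.
    The value returned may be negative for some samples.
    """
    n = len(x)
    if n < 3:
        raise ValueError("at least three observations are needed")
    # Mean squared first differences, expectation 2 + c.
    s1 = sum((x[i] - x[i + 1]) ** 2 for i in range(n - 1)) / (n - 1)
    # Mean squared second differences, expectation 6 + 2c.
    s2 = sum((x[i] - 2 * x[i + 1] + x[i + 2]) ** 2 for i in range(n - 2)) / (n - 2)
    return (6 * s1 - 2 * s2) / (s2 - 2 * s1)


if __name__ == "__main__":
    # A tiny artificial sample for which the estimate is negative.
    print(mustafi_c_hat([0.0, 1.0, 0.0, 0.0]))
```

For the artificial sample `[0, 1, 0, 0]` one gets $S_1^2 = 2/3$ and $S_2^2 = 5/2$, hence an estimate of $-6/7$, which illustrates that the estimate can fall outside the parameter space in small samples.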

Let $\hat\mu_{n,m}$ denote the UMVU estimator of $\mu_n$, based on the last m
observations, in which c is replaced by its estimate $\hat c$. According to
Mustafi's procedure, the first n-m observations are used to estimate c by
(2.8)-(2.9), and the estimator $\hat c$ of c is substituted in (2.6)-(2.7) to
obtain the corresponding weights $\hat\xi_{i,m}$. Notice that the estimator
obtained in this manner is not BLUE anymore. Furthermore, $\hat c$ might be
negative (with positive probability). In such a case, $\hat\xi_{i,m}$ is replaced
by its positive part $\hat\xi^{+}_{i,m} = \max(0, \hat\xi_{i,m})$. Mustafi
established that


(i)    $E\{\hat\mu_{n,m}\} = \mu_n$, for each n, m;

(ii)   $V\{\hat\mu_{n,m}\} \le 1 + \sigma^2 p\,(m-1)$;

and

(iii)  $\lim_{n \to \infty} V\{\hat\mu_{n,m}\} = V\{\hat\mu_m\}$,

where $\hat\mu_m$ is the BLUE estimator based on the last m observations, with
known c.


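2.3  Chernoff-Zacks Bayes Estimators of $\mu_n$

Assuming that $\mu_n$ has a prior normal distribution $N(0, \tau^2)$, we obtain
that the posterior distribution of $\mu_n$, given $\mathbf{X}_n$ and
$\mathbf{J}_n = (J_1, \ldots, J_{n-1})$, is normal, with mean

$$\hat\mu_n(\mathbf{J}_n) =
  \frac{\mathbf{1}_n' \Sigma^{-1}(\mathbf{J}_n) \mathbf{X}_n}
       {\tau^{-2} + \mathbf{1}_n' \Sigma^{-1}(\mathbf{J}_n) \mathbf{1}_n}
  \tag{2.10}$$

and variance

$$V\{\mu_n \mid \mathbf{X}_n, \mathbf{J}_n\} =
  \left[\tau^{-2} + \mathbf{1}_n' \Sigma^{-1}(\mathbf{J}_n) \mathbf{1}_n\right]^{-1},
  \tag{2.11}$$

where $\Sigma(\mathbf{J}_n)$ is the covariance matrix of $\mathbf{X}_n$ given
$\mu_n$ and $\mathbf{J}_n$. By (2.4) it is of the form

$$\Sigma(\mathbf{J}_n) = I + \sigma^2 A(\mathbf{J}_n), \qquad
  A_{ik}(\mathbf{J}_n) = \sum_{l=\max(i,k)}^{n-1} J_l, \quad i, k = 1, \ldots, n,
  \tag{2.12}$$

the (i,k) entry referring to $X_i$ and $X_k$, with the empty sum (i or k equal
to n) taken as zero.

Let $p_n(\mathbf{j})$ be a prior probability function of $\mathbf{J}_n$. The
posterior probability function of $\mathbf{J}_n$, given $\mathbf{X}_n$, is then

$$p_n(\mathbf{j} \mid \mathbf{X}_n) =
  \frac{p_n(\mathbf{j})\, n(\mathbf{X}_n \mid \mathbf{0}, \Sigma_\tau(\mathbf{j}))}
       {\sum_{\{\mathbf{j}\}} p_n(\mathbf{j})\,
        n(\mathbf{X}_n \mid \mathbf{0}, \Sigma_\tau(\mathbf{j}))},
  \tag{2.13}$$

where $\Sigma_\tau(\mathbf{j}) = \Sigma(\mathbf{j}) + \tau^2 \mathbf{1}_n \mathbf{1}_n'$,
and $n(\mathbf{x} \mid \mathbf{0}, \Sigma)$ is the multivariate normal p.d.f. with
mean vector $\mathbf{0}$ and covariance matrix $\Sigma$. Finally, the Bayes
estimator of $\mu_n$ is

$$\hat\mu_n^{(B)} = \sum_{\{\mathbf{j}\}} p_n(\mathbf{j} \mid \mathbf{X}_n)\,
  \hat\mu_n(\mathbf{j}). \tag{2.14}$$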
This estimator is obviously non-linear, due to the non-linear structure of the
posterior probabilities. The structure of the Bayes estimator (2.14) is the same
as that of Barnard's mean-likelihood estimator (2.3). The problem with these
estimators is in their degree of complexity. The sample space of $\mathbf{J}_n$
consists of $2^{n-1}$ different points and it is a very difficult matter to choose
a proper prior distribution. Even if we ascribe, a priori, equal probabilities to
each of these $2^{n-1}$ points, we have to make a very large number of
calculations to determine $\hat\mu_n^{(B)}$. In many problems of interest it is
unreasonable to assume that the mean is likely to shift between any two
observations. If it is reasonable to assume that the number of possible shifts
among a relatively small number of observations is at most one, the computations
will be significantly simplified.

The Bayes estimator based on the assumption of at most one change (AMOC) is
presented in the next section.


2.4  The AMOC-Bayes Estimator of $\mu_n$

According to the AMOC model we assume that among the given n observations
there is at most one change. Let $\tau$ be an integer valued parameter assuming
the values 0, 1, ..., n-1. If $\tau \ge 1$, the first $\tau$ random variables have
the same mean $\mu_n + \delta$ and the last $n - \tau$ random variables have the
mean $\mu_n$. If $\tau = 0$ there was no shift in the mean among the n
observations. Let $\pi(t)$ be the prior probability of $\{\tau = t\}$. The
conditional Bayes estimator, for a given value of t, is

$$\hat\mu_n(t) = \frac{n \bar X_n + \sigma^2 t (n-t) \bar X^*_{n-t}}
                      {n + \sigma^2 t (n-t)}, \quad t = 0, \ldots, n-1,
  \tag{2.15}$$

where $\bar X_n = \frac{1}{n} \sum_{i=1}^{n} X_i$ and
$\bar X^*_{n-t} = \frac{1}{n-t} \sum_{j=t+1}^{n} X_j$.
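Furthermore, the posterior probability of $\{\tau = t\}$, given $\mathbf{X}_n$, is

$$\pi(t \mid \mathbf{X}_n) =
  \frac{\pi(t)}{(n + \sigma^2 t(n-t))^{1/2}\, D_n}
  \exp\left\{ \frac{1}{2} \cdot
  \frac{\sigma^2 t^2 (n-t)^2 (\bar X_t - \bar X^*_{n-t})^2}
       {n\,[n + \sigma^2 t(n-t)]} \right\}, \tag{2.16}$$

where $\bar X_t$ is the mean of the first t observations (the exponent is zero
for t = 0) and

$$D_n = \sum_{j=0}^{n-1} \frac{\pi(j)}{(n + \sigma^2 j(n-j))^{1/2}}
  \exp\left\{ \frac{1}{2} \cdot
  \frac{\sigma^2 j^2 (n-j)^2 (\bar X_j - \bar X^*_{n-j})^2}
       {n\,[n + \sigma^2 j(n-j)]} \right\}. \tag{2.17}$$

The Bayes estimator of $\mu_n$ in the AMOC model is accordingly

$$\hat\mu_n = \sum_{j=0}^{n-1} \pi(j \mid \mathbf{X}_n)\, \hat\mu_n(j). \tag{2.18}$$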
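
The AMOC computations involve only n candidate shift locations and can be sketched in a few lines of Python. The sketch below is illustrative rather than the authors' program: it assumes unit error variance, a shift size $\delta \sim N(0, \sigma^2)$, a flat prior on $\mu_n$, and posterior weights proportional to $\pi(t)\,(n + \sigma^2 t(n-t))^{-1/2} \exp\{\tfrac12 \sigma^2 t^2 (n-t)^2 (\bar X_t - \bar X^*_{n-t})^2 / (n[n + \sigma^2 t(n-t)])\}$, where $\bar X_t$ is the mean of the first t observations. The function names are hypothetical.

```python
import math


def amoc_posterior(x, sigma2, prior):
    """Posterior probabilities of the shift index t = 0, ..., n-1 (t = 0: no shift)."""
    n = len(x)
    log_w = []
    for t in range(n):
        d = n + sigma2 * t * (n - t)
        expo = 0.0
        if t > 0:
            diff = sum(x[:t]) / t - sum(x[t:]) / (n - t)  # mean before minus mean after
            expo = 0.5 * sigma2 * (t * (n - t) * diff) ** 2 / (n * d)
        log_prior = math.log(prior[t]) if prior[t] > 0 else float("-inf")
        log_w.append(log_prior - 0.5 * math.log(d) + expo)
    top = max(log_w)  # subtract the maximum before exponentiating, for stability
    w = [math.exp(v - top) for v in log_w]
    total = sum(w)
    return [v / total for v in w]


def amoc_estimate(x, sigma2, prior):
    """Posterior-weighted average of the conditional estimators, as in (2.15) and (2.18)."""
    n = len(x)
    post = amoc_posterior(x, sigma2, prior)
    estimate = 0.0
    for t in range(n):
        tail_mean = sum(x[t:]) / (n - t)
        cond = (sum(x) + sigma2 * t * (n - t) * tail_mean) / (n + sigma2 * t * (n - t))
        estimate += post[t] * cond
    return estimate


if __name__ == "__main__":
    data = [5.0, 5.0, 5.0, 0.0, 0.0, 0.0]
    flat = [1.0 / len(data)] * len(data)
    print(amoc_posterior(data, 3.0, flat))
    print(amoc_estimate(data, 3.0, flat))
```

For the artificial sample in the usage lines the posterior mass concentrates on t = 3, the true position of the jump, and the estimate is pulled towards the mean of the last three observations.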


2.4.1.  Adaptive AMOC-Bayes Estimation

The AMOC procedure can be applied on the last m observations sequentially,
starting with m = 2 and increasing it until a strong indication emerges that a
shift has taken place. The procedure is then stopped and $\mu_n$ is estimated
according to (2.18) on the basis of the last m observations. This process is
summarized in the following algorithm:

Step 0.  Set m = 2.

Step 1.  Set $Y_1 = X_{n-m+1}, \ldots, Y_m = X_n$.

Step 2.  Compute $\pi(t \mid \mathbf{Y}^{(m)})$, t = 0, ..., m-1.
         If $\pi(0 \mid \mathbf{Y}^{(m)}) = \max_{0 \le j \le m-1} \pi(j \mid \mathbf{Y}^{(m)})$
         go to Step 3; else go to Step 4.

Step 3.  Set m = m+1 and go to Step 1.

Step 4.  Let k = least j, j = 1, ..., m-1, such that
         $\pi(j \mid \mathbf{Y}^{(m)}) = \max_{0 \le t \le m-1} \pi(t \mid \mathbf{Y}^{(m)})$.

Step 5.  Apply estimator (2.18) on the last m-k observations.
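
Steps 0-5 can be sketched as a short Python routine. This is an illustration under stated assumptions, not the authors' program: unit error variance, the posterior weights described for the AMOC model with $\bar X_t$ the mean of the first t observations of the block, a geometric prior $\pi_m(0) = (1-p)^{m-1}$, $\pi_m(t) = p(1-p)^{m-t-1}$, and, when no shift is indicated even for m = n, a fallback to the AMOC estimator on all n observations (the source does not specify this case). All function names are hypothetical.

```python
import math


def amoc_posterior(y, sigma2, prior):
    """Posterior of the shift index t = 0, ..., m-1 within the block y (t = 0: no shift)."""
    m = len(y)
    log_w = []
    for t in range(m):
        d = m + sigma2 * t * (m - t)
        expo = 0.0
        if t > 0:
            diff = sum(y[:t]) / t - sum(y[t:]) / (m - t)
            expo = 0.5 * sigma2 * (t * (m - t) * diff) ** 2 / (m * d)
        log_w.append(math.log(prior[t]) - 0.5 * math.log(d) + expo)
    top = max(log_w)
    w = [math.exp(v - top) for v in log_w]
    total = sum(w)
    return [v / total for v in w]


def amoc_estimate(y, sigma2, prior):
    """Posterior-weighted average of the conditional estimators of the current mean."""
    m = len(y)
    post = amoc_posterior(y, sigma2, prior)
    est = 0.0
    for t in range(m):
        tail_mean = sum(y[t:]) / (m - t)
        est += post[t] * (sum(y) + sigma2 * t * (m - t) * tail_mean) / (m + sigma2 * t * (m - t))
    return est


def geometric_prior(m, p):
    """Prior on t = 0, ..., m-1 used in the sketch (requires 0 < p < 1)."""
    return [(1 - p) ** (m - 1)] + [p * (1 - p) ** (m - t - 1) for t in range(1, m)]


def adaptive_amoc(x, sigma2, p):
    """Return (m, k, estimate): block size at stopping, shift index, current-mean estimate."""
    n = len(x)
    for m in range(2, n + 1):                       # Steps 0, 1 and 3
        post = amoc_posterior(x[n - m:], sigma2, geometric_prior(m, p))
        if post[0] < max(post):                     # Step 2: "no shift" is not the mode
            k = max(range(1, m), key=lambda t: post[t])   # Step 4: least maximizing index
            block = x[n - (m - k):]                 # Step 5: keep the last m - k observations
            return m, k, amoc_estimate(block, sigma2, geometric_prior(m - k, p))
    return n, 0, amoc_estimate(x, sigma2, geometric_prior(n, p))  # assumed fallback


if __name__ == "__main__":
    x = [2.613, 1.661, 1.814, 1.274, 2.616, -0.326, -2.422, -0.119, -0.034]
    print(adaptive_amoc(x, 3.0, 0.2))
```

On the nine observations used in the usage lines, with $\sigma^2 = 3$ and p = .2, this sketch stops at m = 5 with k = 1, so that the last four observations are retained. Because the exact conventions behind the published posterior probabilities are not fully recoverable, the sketch is not claimed to reproduce them digit for digit.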
The following numerical example illustrates the adaptive estimation process
according to the above algorithm. Consider the following n = 9 observations on
independent, normally distributed r.v.'s: $X_1 = 2.613$, $X_2 = 1.661$,
$X_3 = 1.814$, $X_4 = 1.274$, $X_5 = 2.616$, $X_6 = -.326$, $X_7 = -2.422$,
$X_8 = -.119$, $X_9 = -.034$. Assume that $\sigma^2 = 3$ and the prior
distribution is

$$\pi_m(0) = (1-p)^{m-1}, \qquad
  \pi_m(t) = p\,(1-p)^{m-t-1}, \quad t = 1, \ldots, m-1,$$

with p = .2. The posterior probabilities of the change points, given the last
m observations, are given in the following table.

Table 2.2.  Posterior Probabilities of the Shift Locations

   m \ t       0        1        2        3        4

     2      .9298    .0702
     3      .6804    .0722    .2474
     4      .7844    .0660    .0954    .0542
     5      .1765    .0107    .0088    .0890    .7149

According to these posterior probabilities there is a strong indication that a
shift took place between the fifth and the sixth observation. The AMOC Bayes
estimator based on the last four observations is $\hat\mu_n = -.6301$.

Experience with the application of this method on various data sets shows that it
could be too sensitive as an estimator of the location of the shift points. Farley
and Hinich [18] showed in a series of simulations that the above procedure leads to
a high proportion of indications of change when there are none (false alarms). This
problem can, however, be overcome by proper choice of the parameters p and
$\sigma^2$: $\sigma^2$ should be at least 3 or 4 times the variance of the random
variables $\epsilon_1, \ldots, \epsilon_n$.

As an estimator of the current position the above procedure performs very well.
This was also reported by Farley and Hinich in [18]. We provide here some numerical
comparisons of the characteristics of the UMVU, AMOC-Bayes and the Adaptive
AMOC-Bayes estimators of $\mu_n$, based on some simulation experiments. These
results are taken from Chernoff and Zacks [14]. In these experiments 100 replicas
of samples of size n = 9 were simulated from normal distributions, with means
$\mu_i$ and variance 1. In all cases $\mu_9 = 0$. We compare the means and MSE,
over the 100 replicas, of the following estimators:

$\hat\mu_1$:  UMVU with $\sigma^2 = 3$, p = .2

$\hat\mu_2$:  AMOC-Bayes, $\sigma^2 = 3$, p = .2

$\hat\mu_3$:  Adaptive AMOC-Bayes, $\sigma^2 = 3$, p = .2

The models of shifts in the means are:

Model I:   A random change between every two observations, i.e. $\mu_i \sim N(0,2)$
           (i = 1, ..., 8)

Model II:  $\mu_i = \sigma \sum_{k=i}^{8} J_k \eta_k$,
           $J_1, J_2, \ldots$ are Bernoulli, with p = .1, $\sigma = 2$;
           $\eta_1, \eta_2, \ldots$ are N(0,1)

Model III: No change.

The  simulation  estimates  are: 
Table 2.3.  Simulation Characteristics of Three Estimators

   Model             $\hat\mu_1$   $\hat\mu_2$   $\hat\mu_3$

     I      Mean      -.2718        -.1866        -.0827
            MSE       2.1406        3.3140        1.0235

     II     Mean       .0847         .0539         .0525
            MSE        .4460         .4337         .4135

     III    Mean       .0255         .0027        -.0122
            MSE        .3078         .6112         .2679

The above results indicate that the Adaptive AMOC-Bayes estimator performs
as well as or better than the UMVU or the AMOC-Bayes, especially when the actual
process of shifts in the means is different from the one assumed in the model.


3.  Testing  Hypotheses  Concerning  Change  Points 

The problem of testing hypotheses concerning the existence of shift points
was posed by Chernoff and Zacks [14] in the following form.

Let $X_1, \ldots, X_n$ be a sequence of independent random variables having normal
distributions $N(\theta_i, 1)$, i = 1, ..., n. The hypothesis of no shift in the
means, versus the alternative of one shift in a positive direction is

$$H_0: \theta_1 = \cdots = \theta_n = \theta_0$$

vs

$$H_1: \theta_1 = \cdots = \theta_\tau = \theta_0;\quad
       \theta_{\tau+1} = \cdots = \theta_n = \theta_0 + \delta,$$

where $\tau = 1, \ldots, n-1$ is an unknown index of the shift point, $\delta > 0$
is unknown and the initial mean $\theta_0$ may or may not be known.

Chernoff and Zacks showed in [14] that a Bayes test of $H_0$ versus $H_1$, for
$\delta$ values close to zero, is given by the test statistic

$$T_n = \begin{cases}
  \sum_{i=1}^{n-1} i\,(X_{i+1} - \theta_0), & \text{if } \theta_0 \text{ is known} \\[1ex]
  \sum_{i=1}^{n-1} i\,(X_{i+1} - \bar X_n), & \text{if } \theta_0 \text{ is unknown,}
\end{cases} \tag{3.1}$$

where $\bar X_n$ is the average of all the n observations. It is interesting to see
that this test statistic weighs the current observations (those with index close
to n) more than the initial ones. However, the weight is linear rather than
geometric (as in the estimation of the current position). Since the above test
statistic is a linear function of normal random variables, $T_n$ is normally
distributed and it is easy to obtain the critical value for a size $\alpha$ test
and the power function. These functions are given in the paper of Chernoff and
Zacks [14] with some numerical illustrations.
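
Because the statistic is linear in the observations, a size $\alpha$ test needs only its null variance. The Python sketch below is illustrative and assumes the weighting $\sum_{i=1}^{n-1} i\,(X_{i+1} - \text{center})$ with unit-variance normal observations; under these assumptions the null variance is $(n-1)n(2n-1)/6$ when $\theta_0$ is known and $n(n^2-1)/12$ when it is replaced by the sample mean. The function name and return convention are hypothetical.

```python
from statistics import NormalDist


def chernoff_zacks_test(x, alpha=0.05, theta0=None):
    """One-sided test for a positive shift; returns (statistic, critical value, reject)."""
    n = len(x)
    center = theta0 if theta0 is not None else sum(x) / n
    # x[i] is X_{i+1} in the 1-based notation, so it receives weight i.
    t_n = sum(i * (x[i] - center) for i in range(1, n))
    if theta0 is not None:
        var = (n - 1) * n * (2 * n - 1) / 6.0   # sum of squared weights 1, ..., n-1
    else:
        var = n * (n * n - 1) / 12.0            # centered weights 0, ..., n-1
    crit = NormalDist().inv_cdf(1.0 - alpha) * var ** 0.5
    return t_n, crit, t_n > crit


if __name__ == "__main__":
    print(chernoff_zacks_test([0.1, -0.4, 0.3, 1.9, 2.2, 1.7]))
    print(chernoff_zacks_test([0.1, -0.4, 0.3, 1.9, 2.2, 1.7], theta0=0.0))
```

The power at a given shift size and location follows in the same way from the normal distribution of the statistic, since a shift changes only its mean.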

The above results of Chernoff and Zacks were later generalized by Kander and
Zacks [36] to the case of the one-parameter exponential family, in which the
density functions are expressed, in the natural parameter form, as
$f(x;\theta) = h(x) \exp\{\theta U(x) + \psi(\theta)\}$ (see Zacks [70; p. 95]).
Again, Kander and Zacks established that the Bayes test of $H_0$, for small values
of $\delta$ when $\theta_0$ is known, is of the form (3.1), where the $X_i$ are
replaced by $U(X_i)$ (i = 1, ..., n). The exact determination of the critical
levels might require a numerical approach, since the exact distribution of $T_n$
is not normal if the $U(X_i)$ are not normal. Kander and Zacks showed how the
critical levels and the power functions can be determined exactly, in the binomial
and the negative-exponential cases. If the samples are large, the null
distribution of $T_n$ converges to a normal one, according to the Lyapunov version
of the Central Limit Theorem (see Fisz [20; p. 202]). Kander and Zacks [36]
provided numerical comparisons of the exact and asymptotic power functions of
$T_n$, in the binomial and the negative-exponential cases.

It is often the case that the sample size is not sufficiently large for the normal
approximation to yield results close to the true ones. For this reason, Kander
and Zacks tried to approximate the exact distribution of $T_n$ by the Edgeworth
expansion

$$F_n(z) \cong \Phi(z) - \frac{\gamma_{1,n}}{3!} \Phi^{(3)}(z)
  + \frac{\gamma_{2,n}}{4!} \Phi^{(4)}(z)
  + \frac{10\,\gamma_{1,n}^2}{6!} \Phi^{(6)}(z), \tag{3.2}$$

where $F_n(z)$ is the exact distribution of the standardized test statistic
$Z_n = (T_n - E\{T_n\})/(\mathrm{Var}\{T_n\})^{1/2}$; $\Phi(z)$ is the standard
normal c.d.f.; $\Phi^{(\nu)}(z)$ is the $\nu$-th derivative of $\Phi(z)$; and
$\gamma_{1,n} = \mu_{3,n}/(\mu_{2,n})^{3/2}$,
$\gamma_{2,n} = \mu_{4,n}/\mu_{2,n}^2 - 3$, where $\mu_{j,n}$ is the j-th central
moment of $T_n$.

It was shown that when the samples are not large (n = 10) the Edgeworth expansion
of the c.d.f. of $Z_n$, under the alternative hypothesis $H_1$, provides power
function approximations better than those of the normal approximation. Hsu [34]
utilized the above test for testing whether a shift occurred in the variance
of a normal distribution.
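
The Edgeworth correction in (3.2) is straightforward to evaluate once the skewness and excess kurtosis of the statistic are available. The Python sketch below is illustrative; it uses the standard identities $\Phi^{(3)}(z) = (z^2 - 1)\varphi(z)$, $\Phi^{(4)}(z) = -(z^3 - 3z)\varphi(z)$ and $\Phi^{(6)}(z) = -(z^5 - 10z^3 + 15z)\varphi(z)$, where $\varphi$ is the standard normal density. The function name and the arguments `gamma1`, `gamma2` are hypothetical labels for $\gamma_{1,n}$ and $\gamma_{2,n}$.

```python
from math import factorial
from statistics import NormalDist


def edgeworth_cdf(z, gamma1, gamma2):
    """Edgeworth approximation to the c.d.f. of a standardized statistic at z."""
    nd = NormalDist()
    phi = nd.pdf(z)
    d3 = (z * z - 1.0) * phi                        # third derivative of Phi
    d4 = -(z ** 3 - 3.0 * z) * phi                  # fourth derivative of Phi
    d6 = -(z ** 5 - 10.0 * z ** 3 + 15.0 * z) * phi  # sixth derivative of Phi
    return (nd.cdf(z)
            - gamma1 / factorial(3) * d3
            + gamma2 / factorial(4) * d4
            + 10.0 * gamma1 ** 2 / factorial(6) * d6)


if __name__ == "__main__":
    for z in (-1.0, 0.0, 1.645):
        print(z, edgeworth_cdf(z, 0.6, 0.4), NormalDist().cdf(z))
```

With both cumulant ratios set to zero the function returns the plain normal approximation, which makes the size of the correction easy to inspect for a given pair of values.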

Gardner [21] considered the testing problem of $H_0$ versus $H_1$ for normally
distributed random variables, but with $\delta \ne 0$ unknown. He showed that
the Bayes test statistic, with prior probabilities $\pi_t$, t = 1, 2, ..., n-1, is

$$Q_n = \sum_{t=1}^{n-1} \pi_t \Big[ \sum_{j=t}^{n-1} (X_{j+1} - \bar X_n) \Big]^2
      = \sum_{t=1}^{n-1} \pi_t\, (n-t)^2 (\bar X^*_{n-t} - \bar X_n)^2, \tag{3.3}$$

where $\bar X^*_{n-t}$ is the mean of the last n-t observations and $\bar X_n$ is
the mean of all n observations. Gardner investigated the exact and the asymptotic
distributions of $Q_n$, under the null hypothesis $H_0$ and under the alternative
$H_1$, for the case of equal prior probabilities. Scaling $Q_n$, so that its
expected value will be 1 for each n, by the transformation
$Y_n = (6n/(n^2-1))\,Q_n$, n = 2, 3, ..., we obtain that, under $H_0$, $Y_n$ is
distributed like $\sum_{k=1}^{n-1} \lambda_k U_k^2$, where $U_1, \ldots, U_{n-1}$
are i.i.d. standard normal r.v.'s and


$$\lambda_k = \frac{3}{2(n^2-1)} \Big[ \cos\frac{k\pi}{2n} \Big]^{-2},
  \quad k = 1, \ldots, n-1. \tag{3.4}$$

Thus, as $n \to \infty$, the asymptotic distribution of $Y_n$, under $H_0$, is
like that of

$$Y = \frac{6}{\pi^2} \sum_{k=1}^{\infty} \frac{1}{k^2} U_k^2. \tag{3.5}$$

The distribution of Y is that of the asymptotic distribution of Smirnov's
statistic $\omega_n^2$, normalized to have mean 1. Smirnov's statistic compares
the empirical c.d.f. of a sample of continuous random variables to a particular
distribution, $F_0(x)$. More specifically, if $X_{(1)} \le \cdots \le X_{(n)}$ is
the order statistic corresponding to n i.i.d. random variables, and if $F_n(x)$ is
the corresponding empirical c.d.f., i.e.,
$F_n(x) = \sum_{j=1}^{n} \frac{j}{n}\, I\{X_{(j)} \le x < X_{(j+1)}\}$ (with
$X_{(n+1)} = \infty$), Smirnov's statistic is

$$\omega_n^2 = \frac{1}{2n} + 6 \sum_{j=1}^{n}
  \Big[ F_0(X_{(j)}) - F_n(X_{(j)}) + \frac{1}{2n} \Big]^2. \tag{3.6}$$
Gardner refers the reader to Table VIII of von Mises [66] for the critical values
of $Y_n$, for large n. Critical values $c_n(\alpha)$, for $\alpha = .10, .05$ and
$.01$ and various values of n, can be obtained from Figure 1 of Gardner's paper.
Gardner showed also that, under $H_0$, the p.d.f. of $Y_n$ is

$$f_n(y) = \frac{1}{\pi} \int_0^\infty \prod_{k=1}^{n-1}
  (1 + t^2 a_k^{-2})^{-1/4}
  \cos\Big( t y - \frac{1}{2} \sum_{k=1}^{n-1} \tan^{-1}(t\, a_k^{-1}) \Big)\, dt,
  \tag{3.7}$$

where $a_k = \frac{n^2-1}{3} \cos^2(k\pi/2n)$, k = 1, ..., n-1. The integration of
$f_n(y)$ for the determination of its $(1-\alpha)$th fractile, $c_n(\alpha)$,
requires special numerical techniques. The power function of the test was
determined by Gardner in some special cases by simulation.

Sen and Srivastava [56] discussed the statistic

$$\frac{1}{n^2} \sum_{i=1}^{n-1} \Big( \sum_{j=i}^{n-1} X_{j+1} \Big)^2
  = \frac{1}{n^2} \sum_{i=1}^{n-1} (n-i)^2 (\bar X^*_{n-i})^2 \tag{3.8}$$

for testing $H_0$ versus $H_1$ with $\delta \ne 0$, when the initial mean,
$\theta_0$, is known. They showed that the asymptotic distribution of this
statistic, under $H_0$, has the c.d.f.

$$F(z) = \frac{2\sqrt{2}}{\sqrt{\pi}} \sum_{j=0}^{\infty} (-1)^j
  \frac{\Gamma(j + \frac{1}{2})}{\Gamma(j+1)}
  \Big( 1 - \Phi\Big( \frac{4j+1}{2\sqrt{z}} \Big) \Big). \tag{3.9}$$

In addition, they derived the c.d.f. of the statistic for finite values of n, and
provided a table in which these distributions are presented for n = 10, 20, 50 and
$\infty$ (asymptotic).

In addition, Sen and Srivastava proposed test statistics which are based on the
likelihood ratio test. More specifically, for testing $H_0$ versus $H_1$, with
$\delta > 0$, when the initial mean $\mu_0$ is unknown, the likelihood function,
when the shift is at a point t and the two means are replaced by their maximum
likelihood estimates, is

$$L_t(\mathbf{X}_n) = \frac{1}{(2\pi)^{n/2}} \exp\Big\{ -\frac{1}{2} \Big[
  \sum_{i=1}^{t} (X_i - \bar X_t)^2 + \sum_{i=t+1}^{n} (X_i - \bar X^*_{n-t})^2
  \Big] \Big\}, \tag{3.10}$$

where $\bar X_t$ and $\bar X^*_{n-t}$ are the means of the first t and of the last
n-t observations. It can be easily shown that the likelihood ratio test statistic
is then

$$\Lambda_n = \sup_{1 \le t \le n-1}
  (\bar X^*_{n-t} - \bar X_t) \Big/ \Big( \frac{1}{t} + \frac{1}{n-t} \Big)^{1/2}.
  \tag{3.11}$$

Power comparisons of the Chernoff and Zacks Bayesian statistic and the
likelihood ratio statistic are given for some values of n and point of
shift $\tau$. These power comparisons are based on simulations, which indicate
that the Chernoff-Zacks Bayesian statistic is generally more powerful than the
Sen-Srivastava likelihood ratio statistic when $\tau \approx n/2$. On the other
hand, when $\tau$ is close to 1 or to n, the likelihood ratio test statistic is
more powerful.
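
The likelihood ratio statistic (3.11) is a maximum of standardized two-sample mean differences and can be computed in a single pass over the candidate shift points. The Python sketch below is illustrative; it assumes unit-variance observations and a shift in the positive direction, so the difference is taken as the mean of the last n-t observations minus the mean of the first t. The function name is hypothetical.

```python
def likelihood_ratio_statistic(x):
    """Return (statistic, t_hat): the maximized standardized difference and its location."""
    n = len(x)
    if n < 2:
        raise ValueError("at least two observations are needed")
    total = sum(x)
    head = 0.0          # running sum of the first t observations
    best, best_t = None, None
    for t in range(1, n):
        head += x[t - 1]
        mean_first = head / t
        mean_last = (total - head) / (n - t)
        stat = (mean_last - mean_first) / (1.0 / t + 1.0 / (n - t)) ** 0.5
        if best is None or stat > best:
            best, best_t = stat, t
    return best, best_t


if __name__ == "__main__":
    print(likelihood_ratio_statistic([0.2, -0.1, 0.4, 1.8, 2.1, 1.6, 2.4]))
```

The maximizing index returned alongside the statistic is the corresponding maximum likelihood estimate of the shift point, which is why the same computation reappears in the estimation problem.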

Bhattacharyya and Johnson [9] approached the testing problem in a non-parametric
fashion. It is assumed that the random variables $X_1, X_2, \ldots, X_n$ are
independent and have continuous distributions $F_i$ (i = 1, ..., n). Two types of
problems are discussed: one in which the initial distribution, $F_0$, is known and
is symmetric around the origin, and one in which the initial distribution is
unknown and not necessarily symmetric. The hypotheses corresponding to the shift
problem when $F_0$ is known are $H_0: F_0 = F_1 = \cdots = F_n$, for some specified
$F_0$ in $\mathcal{F}_0 = \{F: F \text{ continuous and symmetric about } 0\}$,
versus

$$H_1: F_0 = F_1 = \cdots = F_\tau > F_{\tau+1} = \cdots = F_n, \quad
  \text{some } F_0 \in \mathcal{F}_0.$$

$\tau$ is an unknown shift parameter. $F_\tau > F_{\tau+1}$ indicates that the
random variables after the point of shift are stochastically greater than the ones
before it.

For the case of known initial distribution $F_0(x)$, the test is constructed with
respect to a translation alternative of the form $F_{\tau+1}(x) = F_0(x - \Delta)$,
where $\Delta > 0$ is an unknown parameter. The problem is invariant with respect
to the group of all transformations $x_i' = g(x_i)$, i = 1, ..., n, where g(x) is
continuous, odd and strictly increasing. The maximal invariant statistic is
$(R_1, \ldots, R_n)$ and $(J_1, \ldots, J_n)$, where $R_i$ = rank of $|X_i|$
(i = 1, ..., n), and $J_i = 0$ if $\mathrm{sgn}(X_i) = -1$, $J_i = 1$ if
$\mathrm{sgn}(X_i) = 1$.

The average power of a test is thus

$$\bar\psi(\Delta) = \sum_{i=1}^{n} q_i\, \psi(\Delta \mid i-1),$$

where $\psi(\Delta \mid t)$ is the power at $\Delta$, when the shift occurs after t
observations, and $q_1, \ldots, q_n$ are given probability weights
($q_i \ge 0$, $\sum q_i = 1$). Bhattacharyya and Johnson proved that, under some
general smoothness conditions on the p.d.f. $f_0(x)$, the form of the invariant
test statistic, maximizing the derivative of the average power
$\bar\psi(\Delta)$ at $\Delta = 0$, is
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(3.12) 


Tn  "  l  Qi  sgn  (Xi)  E{~f0  <V(Ri))/f0  (V^Ki;)} 
where  £  ...  s  is  an  ordered  statistic  of  n  i.i.d.  random  variables 


having  a  distribution  Fq(x)  ,  and 


£  q.  .  More  specific  formulae  for  the 

J-l  2 


cases  of  double-exponential,  logistics  and  normal  distributions  are  given.  The 
null  hypothesis  is  rejected  for  large  values  of  .  It  is  further  proven 

n 

that,  any  test  of  the  form  T  *  £  Q  sgn  (X  )  U(R.)  ,  where  U  is  a  strictly 

i«l  1  1 

increasing  function,  is  unbiased.  Moreover,  if  the  system  of  weights 

{q  . ;i=l,...,n}  satisfies  the  condition 
n,i 


(3.13)  lim  -i-  l  Q2  =  b2  ,  0<b2<°° 

n  -*■  «  i-1  ’ 

2  r  2  15 

then,  the  distribution  of  T  /  (nb  (  J  tp  (u)  dn))  ,  as  n-»«°  ,  converges  to  the 

n  0 

standard  distribution,  where 

(3.14)  <Ku)  -  -f'Q  (F"1  (lj(u+l))/f0  (f'1  (Jj(u+1))) 

Similar analysis is done for the case of unknown initial distribution $F_0$. In this case the test statistic is a function of the maximal invariant $(S_1, \ldots, S_n)$, which are the ranks of $(X_1, \ldots, X_n)$. The test statistic in this case is of the general form

(3.15)  $T_n^* = \sum_{i=1}^{n} Q_i\, E\{-f'(V^{(S_i)}) / f(V^{(S_i)})\}$

In the normal case, for example, with equal weights for $t = 2, \ldots, n$ and weight 0 for $t = 1$, the test statistic is $T_n^* = \sum_{i=1}^{n} (i-1) S_i$.

Notice the similarity in structure between the statistic $T_n^*$ and that of Chernoff and Zacks, $T_n$. The difference is that the actual values of $X_i$ are replaced by their ranks, $S_i$.
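The rank statistic quoted above for the normal case is simple enough to sketch in a few lines. The following Python fragment is only an illustration: the function names `ranks` and `rank_shift_statistic` are hypothetical, and the sketch assumes continuous data with no ties.

```python
def ranks(x):
    """Return the ranks S_1, ..., S_n of the observations (1 = smallest).

    Assumes there are no ties, as for continuous data.
    """
    order = sorted(range(len(x)), key=lambda i: x[i])
    s = [0] * len(x)
    for rank, idx in enumerate(order, start=1):
        s[idx] = rank
    return s


def rank_shift_statistic(x):
    """Compute sum_{i=1}^{n} (i - 1) * S_i for the sample x."""
    s = ranks(x)
    # enumerate() starts at 0, which is exactly (i - 1) for a 1-based index i.
    return sum(i_minus_1 * s_i for i_minus_1, s_i in enumerate(s))
```

Large values of the statistic arise when the large ranks sit late in the sequence, which is the pattern expected under an upward shift.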

Hawkins [23] also considered the normal case, with two sided hypothesis, both $\theta_0$ and $\delta$ unknown. Like Sen and Srivastava, he considered the test statistic $U_n = \max_{1 \le k \le n-1} |T_k|$, where

(3.16)  $T_k = \sqrt{\frac{n}{k(n-k)}} \sum_{i=1}^{k} (X_i - \bar{X}_n)$,  $k = 1, \ldots, n-1$.

The statistics $T_1, \ldots, T_{n-1}$ are normally distributed, having a correlation function

(3.17)  $\rho(T_m, T_k) = \sqrt{\frac{m(n-k)}{k(n-m)}}$,  $m \le k$.

Hawkins provides recursive formulae for the exact determination of the distribution of $U_n$. Conservative testing can be made by applying the Bonferroni inequality

$P\{\max_{1 \le k \le n-1} |T_k| > c\} \le (n-1)\, P\{|T_1| > c\} = 2(n-1)\, \Phi(-c)$.

Hence, a conservative $\alpha$ level test of $H_0$ can be based on the critical level $z_{1-\alpha/(2n-2)}$, where $z_\gamma$ is the $\gamma$-fractile of the standard normal distribution. A numerical example is given to compare the exact and the Bonferroni approximation to the critical values of the test statistic $U_n$. In an attempt to understand the asymptotic properties of $U_n$, Hawkins considered the behavior of the maximum of a Gaussian process having the same covariance structure as that of $T_1, \ldots, T_{n-1}$. The asymptotic results are still not satisfactory.
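The conservative Bonferroni test described above can be sketched as follows. This is a minimal illustration, not Hawkins' exact procedure: it assumes unit-variance normal observations, reads $T_k$ as the standardized partial sum of deviations from the overall mean, and uses hypothetical function names.

```python
import math
from statistics import NormalDist


def hawkins_statistics(x):
    """Return [T_1, ..., T_{n-1}], assuming unit-variance observations."""
    n = len(x)
    mean = sum(x) / n
    stats = []
    partial = 0.0  # running sum of (X_i - mean) for i <= k
    for k in range(1, n):
        partial += x[k - 1] - mean
        stats.append(math.sqrt(n / (k * (n - k))) * partial)
    return stats


def bonferroni_critical_value(n, alpha):
    """Critical level z_{1 - alpha/(2n-2)} of the conservative test."""
    return NormalDist().inv_cdf(1.0 - alpha / (2 * n - 2))


def hawkins_test(x, alpha=0.05):
    """Return (U_n, maximizing k, critical value, reject?)."""
    t = hawkins_statistics(x)
    u_n = max(abs(v) for v in t)
    k_hat = 1 + max(range(len(t)), key=lambda i: abs(t[i]))
    c = bonferroni_critical_value(len(x), alpha)
    return u_n, k_hat, c, u_n > c
```

For $n = 20$ and $\alpha = .05$ the critical level is the $1 - .05/38$ fractile of the standard normal distribution, which is close to 3.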

Pettitt [50] discussed non-parametric tests different from those of Bhattacharyya and Johnson. He defined for each $t = 1, \ldots, n$, $U_{t,n} = \sum_{i=1}^{t} \sum_{j=t+1}^{n} \operatorname{sgn}(X_i - X_j)$ and studied the properties of the test statistic

(3.18)  $K_n = \max_{1 \le t \le n} |U_{t,n}|$

The distribution of $K_n$ was studied for Bernoulli random variables.
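A direct, unoptimized computation of Pettitt's statistic may help fix the definition. The sketch below is an assumption-laden illustration (the names `pettitt_statistics` and `pettitt_k` are hypothetical); it evaluates the double sum literally, so it costs on the order of $n^3$ operations and is meant only for small samples.

```python
def sign(v):
    """Return -1, 0 or 1 according to the sign of v."""
    return (v > 0) - (v < 0)


def pettitt_statistics(x):
    """Return [U_{1,n}, ..., U_{n,n}] with U_{t,n} = sum_{i<=t} sum_{j>t} sgn(X_i - X_j)."""
    n = len(x)
    u = []
    for t in range(1, n + 1):
        u_t = sum(sign(x[i] - x[j]) for i in range(t) for j in range(t, n))
        u.append(u_t)
    return u


def pettitt_k(x):
    """Return (K_n, t maximizing |U_{t,n}|)."""
    u = pettitt_statistics(x)
    k_n = max(abs(v) for v in u)
    t_hat = 1 + max(range(len(u)), key=lambda i: abs(u[i]))
    return k_n, t_hat
```

For the sample `[1, 2, 3, 10, 11, 12]` the values of $U_{t,n}$ are $-5, -8, -9, -8, -5, 0$, so the maximum absolute value 9 is attained at $t = 3$, the position of the jump.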


4.  Estimating  the  Location  of  the  Shift  Point 

Two types of estimators of the location of the shift point, $\tau$, appear in the literature: Bayesian and maximum likelihood. El-Sayyad [17], Smith [62], Broemeling [11], Zacks [70; p. 311] and others give the general Bayesian framework for inference concerning the location of the shift point, $\tau$, in an AMOC model.

Hinkley  [28]  studied  the  maximum  likelihood  estimator.  We  start  with  an 
example  concerning  the  Bayesian  estimation  and  proceed  then  to  present  Hinkley's 
results. 

4.1  Bayesian  Estimation  of  the  Change  Point 

The Bayesian procedure is to derive the posterior distribution of the change point $\tau$, and determine the estimator which minimizes the posterior risk, for a specified loss function.

If the loss function for estimating $\tau$ by $\hat{\tau}$ is $L(\hat{\tau}, \tau) = |\hat{\tau} - \tau|$, then the Bayes estimator of the change point is the median of the posterior distribution of $\tau$, given $\mathbf{X}_n$. For example, suppose that $X_1, \ldots, X_n$ are independent random variables having normal distributions $N(\theta_i, 1)$, where

$\theta_1 = \cdots = \theta_\tau = \theta_0$,  $\theta_{\tau+1} = \cdots = \theta_n = \theta_0 + \delta$,

with $\theta_0$ known ($\theta_0 = 0$ say). Furthermore, assume that the prior distribution of $\delta$ is normal, $N(0, \sigma^2)$, independently of $\tau$, and $\tau$ has prior probabilities $\Pi(t) = P\{\tau = t\}$, $t = 1, \ldots, n$. Here $\{\tau = n\}$ indicates the event of no change.

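The posterior probabilities of $\tau$ for this model are

(4.1)  $\Pi(t \mid \mathbf{X}_n) = \dfrac{\Pi(t)\,(1 + (n-t)\sigma^2)^{-1/2} \exp\left\{\dfrac{(n-t)^2 \sigma^2}{2(1 + (n-t)\sigma^2)} (\bar{X}^*_{n-t})^2\right\}}{\sum_{j=1}^{n} \Pi(j)\,(1 + (n-j)\sigma^2)^{-1/2} \exp\left\{\dfrac{(n-j)^2 \sigma^2}{2(1 + (n-j)\sigma^2)} (\bar{X}^*_{n-j})^2\right\}}$

where $\bar{X}^*_{n-t} = \frac{1}{n-t} \sum_{i=t+1}^{n} X_i$ is the average of the last $(n-t)$ observations (the exponent is zero when $t = n$).

The median of the posterior distribution is then the Bayes estimator of $\tau$, namely

(4.2)  $\hat{\tau}$ = least positive integer $t$ such that $\sum_{i=1}^{t} \Pi(i \mid \mathbf{X}_n) \ge .5$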

In the following table we present the posterior probabilities (4.1) computed for the values of four simulated samples. Each sample consists of $n = 20$ normal variates with means $\theta_i$ and variance 1. In all cases $\theta_0 = 0$. Case I consists of a sample with no change in the mean, $\delta = 0$. Cases II-IV have a shift in the mean at $\tau = 10$, and $\delta = .5$, 1.0 and 2.0. Furthermore, the prior probabilities of $\tau$ are $\Pi(t) = p(1-p)^{t-1}$ for $t = 1, \ldots, n-1$ and $\Pi(n) = (1-p)^{n-1}$, with $p = .01$; and the prior variance of $\delta$ is $\sigma^2 = 3$.


Table 4.1  Posterior Probabilities of $\{\tau = t\}$

  t    $\delta=0$    $\delta=.5$   $\delta=1.0$  $\delta=2.0$
  1    0.002252      0.012063      0.003005      0.000000
  2    0.004284      0.016045      0.002885      0.000000
  3    0.004923      0.016150      0.002075      0.000000
  4    0.006869      0.022634      0.002193      0.000000
  5    0.006079      0.008002      0.002202      0.000001
  6    0.004210      0.006261      0.002291      0.000050
  7    0.004020      0.006735      0.001954      0.000026
  8    0.002867      0.015830      0.001789      0.000015
  9    0.003534      0.015914      0.001959      0.001087
 10    0.002972      0.011537      0.002228      0.068996
 11    0.003033      0.019014      0.002708      0.908434
 12    0.003070      0.010335      0.002661      0.016125
 13    0.003395      0.006026      0.002996      0.005237
 14    0.003087      0.003201      0.003017      0.000009
 15    0.004064      0.003461      0.003096      0.000011
 16    0.003355      0.002709      0.002820      0.000009
 17    0.004991      0.002899      0.003078      0.000000
 18    0.009664      0.003486      0.004004      0.000000
 19    0.007255      0.006106      0.012432      0.000000
 20    0.916077      0.811593      0.940607      0.000000

We see in Table 4.1 that the Bayes estimator for Cases I-III is $\hat{\tau} = 20$ (no change), while in Case IV it is $\hat{\tau} = 11$. That is, if the magnitude of change in the mean is about twice the standard deviation of the random variables, the posterior distribution is expected to have its median close to the true change point.
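The posterior probabilities (4.1) and the median rule (4.2) can be evaluated with a few lines of code. The sketch below is an illustration only: the function names are hypothetical, $\theta_0 = 0$ and unit variance are assumed as in the example, the prior probabilities are assumed strictly positive, and the weights are computed on the log scale to avoid overflow. It is not the program used to produce Table 4.1.

```python
import math


def geometric_prior(n, p):
    """Pi(t) = p(1-p)^(t-1) for t < n and Pi(n) = (1-p)^(n-1)."""
    prior = [p * (1.0 - p) ** (t - 1) for t in range(1, n)]
    prior.append((1.0 - p) ** (n - 1))
    return prior


def posterior_change_point(x, prior, sigma2):
    """Posterior probabilities (4.1) of {tau = t}, t = 1..n, with theta_0 = 0."""
    n = len(x)
    log_w = [0.0] * n
    tail_sum = 0.0  # running sum of X_{t+1}, ..., X_n
    for t in range(n, 0, -1):
        if t < n:
            tail_sum += x[t]  # 0-based x[t] is X_{t+1}
        v = 1.0 + (n - t) * sigma2
        # (n-t)^2 * sigma^2 * mean^2 equals sigma^2 * tail_sum^2
        log_w[t - 1] = (math.log(prior[t - 1]) - 0.5 * math.log(v)
                        + sigma2 * tail_sum ** 2 / (2.0 * v))
    top = max(log_w)  # subtract the maximum before exponentiating
    w = [math.exp(lw - top) for lw in log_w]
    total = sum(w)
    return [wi / total for wi in w]


def posterior_median(post):
    """Bayes estimator (4.2): least t with cumulative posterior >= 0.5."""
    cum = 0.0
    for t, prob in enumerate(post, start=1):
        cum += prob
        if cum >= 0.5:
            return t
    return len(post)
```

A call such as `posterior_median(posterior_change_point(x, geometric_prior(len(x), 0.01), 3.0))` mirrors the setting of the example, with `x` standing for any observed sample.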

In many studies (for example, Smith [62]) the Bayesian model is based on the assumption of equal prior probabilities of $\{\tau = t\}$. Such prior probabilities yield in the above cases the following posterior probabilities.

Table 4.2  Posterior Probabilities of $\{\tau = t\}$, for Equal Prior Probabilities

  t    $\delta=0$    $\delta=.5$   $\delta=1.0$  $\delta=2.0$
  1    0.023329      0.030693      0.019734      0.000001
  2    0.020209      0.031206      0.037996      0.000006
  3    0.024060      0.028078      0.125330      0.000035
  4    0.023996      0.024921      0.081694      0.000149
  5    0.023063      0.026290      0.083705      0.002859
  6    0.022546      0.030888      0.111434      0.005653
  7    0.022951      0.042321      0.079959      0.001071
  8    0.029850      0.036347      0.059293      0.005238
  9    0.043298      0.030515      0.026376      0.029615
 10    0.043976      0.031933      0.069415      0.931462
 11    0.052939      0.033107      0.020594      0.014332
 12    0.059540      0.037187      0.034396      0.008651
 13    0.065588      0.048819      0.033543      0.000431
 14    0.037356      0.040960      0.052289      0.000457
 15    0.060050      0.049399      0.043785      0.000037
 16    0.055957      0.055566      0.048865      0.000004
 17    0.049753      0.069433      0.022328      0.000000
 18    0.050994      0.085113      0.034621      0.000000
 19    0.156117      0.092993      0.012691      0.000000
 20    0.134429      0.174230      0.001955      0.000000

As seen in Table 4.2, the Bayes estimator $\hat{\tau}$ when $\delta = 2$ is exactly at the true point of change $\tau = 10$. On the other hand, when $\delta = 0$ the estimate is $\hat{\tau} = 16$.

Smith  derived  formulae  of  the  Bayes  estimators  for  cases  of  sequences  of 
Bernoulli  trials  [62],  and  for  switching  linear  regression  problems  [63]. 

Bayesian  estimators  for  the  location  of  the  shift  parameter  for  switching 
regression  problems  are  given  also  by  Ferriera  [19],  Holbert  and  Broemeling  [32], 
Tsurumi  [65]  and  others. 

4.2  Maximum  Likelihood  Estimators 

Let $X_1, X_2, \ldots, X_n$ be a sequence of independent random variables. As before, assume that

$X_1, X_2, \ldots, X_\tau \sim F_0(x)$

and

$X_{\tau+1}, \ldots, X_n \sim F_1(x)$,

where $F_0(x)$ and $F_1(x)$ are specified distributions, and $\tau$ is the unknown point of shift. The maximum likelihood estimator (MLE) of $\tau$ is

(4.3)  $\hat{\tau}_n$ = least positive integer $t$, $t = 1, \ldots, n$, maximizing $S_{n,t}$, where

(4.4)  $S_{n,t} = \begin{cases} \sum_{i=1}^{t} \log f_0(X_i) + \sum_{i=t+1}^{n} \log f_1(X_i), & \text{if } t = 1, \ldots, n-1 \\ \sum_{i=1}^{n} \log f_0(X_i), & \text{if } t = n \end{cases}$

$f_0(x)$ and $f_1(x)$ are the p.d.f.'s corresponding to $F_0(x)$ and $F_1(x)$. We present here the method of deriving the asymptotic distribution of $\hat{\tau}_n$, as $n \to \infty$ and $\tau \to \infty$, following the development of Hinkley [28].
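For a finite sample, the estimator (4.3) can be found with a single pass over the data. The sketch below is an illustration under stated assumptions: the function names are hypothetical, and the normal log-densities are chosen only as an example of two fully specified distributions. It accumulates the partial sums of $\log f_0(X_i) - \log f_1(X_i)$, which differ from $S_{n,t}$ only by a term that does not depend on $t$.

```python
import math


def normal_logpdf(x, mean, sd=1.0):
    """Log-density of N(mean, sd^2) at x."""
    return -0.5 * math.log(2 * math.pi * sd * sd) - (x - mean) ** 2 / (2 * sd * sd)


def mle_change_point(x, logf0, logf1):
    """Return (least t maximizing S_{n,t}, the maximal S_{n,t})."""
    n = len(x)
    total1 = sum(logf1(v) for v in x)  # sum of log f_1 over all observations
    best_t, best_s = None, -math.inf
    v_t = 0.0  # partial sum of log f_0(X_i) - log f_1(X_i), i <= t
    for t in range(1, n + 1):
        v_t += logf0(x[t - 1]) - logf1(x[t - 1])
        s = v_t + total1  # equals S_{n,t}
        if s > best_s:  # strict inequality keeps the least maximizer
            best_t, best_s = t, s
    return best_t, best_s
```

With `logf0` and `logf1` set to the log-densities of $N(0,1)$ and $N(2,1)$, the sample `[0.1, -0.2, 0.0, 2.1, 1.9, 2.0]` gives the estimate $t = 3$.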

Let $U_i = \log f_0(X_i) - \log f_1(X_i)$, $i = 1, 2, \ldots, n$. Since $S_{n,t} = \sum_{i=1}^{t} U_i + \sum_{i=1}^{n} \log f_1(X_i)$, it readily follows that $\hat{\tau}_n$ is the least positive integer maximizing $V_t = \sum_{i=1}^{t} U_i$ ($t = 1, \ldots, n$). Consider the sequence $W_t = V_t - V_\tau$, where $\tau$ is the true point of shift. For very large value of $\tau$ ($\tau \to \infty$) consider the backward and forward sequences

$W = \{0,\ -U_\tau,\ -U_\tau - U_{\tau-1},\ \ldots,\ -\sum_{j=0}^{k} U_{\tau-j},\ \ldots\}$

$W' = \{0,\ U_{\tau+1},\ \ldots,\ \sum_{j=0}^{k} U_{\tau+1+j},\ \ldots\}$

all the observations are from $F_1(x)$. If $\tau = t$ ($t = 2, 3, \ldots$) then the first $t-1$ observations are from $F_0(x)$ and $X_t, X_{t+1}, \ldots$ are from $F_1(x)$. Let $f_0(x)$ and $f_1(x)$ be the p.d.f. corresponding to $F_0(x)$ and $F_1(x)$, respectively.

The random variables $X_1, X_2, \ldots$ are observed sequentially and we wish to apply a stopping rule which will stop soon after the shift occurs, without too many "false alarms". The following objectives are considered in the selection of a stopping variable $N$:

1) If $\Pi(\tau)$ denotes the prior distribution of $\tau$, then the prior risk

(5.1)  $R(\Pi, N) = P_\Pi(N < \tau) + c\, P_\Pi(N \ge \tau)\, E_\Pi\{N - \tau \mid N \ge \tau\}$

is minimized, with respect to all stopping rules.

2) To minimize $E_\Pi\{N - \tau \mid N \ge \tau\}$ subject to the constraint $P_\Pi(N < \tau) \le \alpha$, $0 < \alpha < 1$.

5.1 The Bayesian Procedures

The shift index, $\tau$, is considered a random variable, having a prior p.d.f. $\Pi(t)$, concentrated on the non-negative integers. Shiryaev [60] postulated the following prior distribution

(5.2)  $\Pi(t) = \begin{cases} \Pi, & \text{if } t = 0 \\ (1-\Pi)\, p\, (1-p)^{t-1}, & \text{if } t = 1, 2, \ldots \end{cases}$

for $0 < \Pi < 1$, $0 < p < 1$. $(\Pi + (1-\Pi)p)$ is the prior probability that the shift has occurred before the first observation, and $p$ is the prior probability of a shift occurring between any two observations.

After observing $X_1, \ldots, X_n$, the prior p.d.f. $\Pi(t)$ is converted to a posterior probability function on $\{n, n+1, \ldots\}$, namely,

(5.3)  $\Pi_n(t) = \begin{cases} \Pi_n, & t = n \\ (1-\Pi_n)\, p\, (1-p)^{t-n-1}, & t = n+1, n+2, \ldots \end{cases}$

where $\Pi_n$ is the posterior probability that the shift took place before the n-th observation. This posterior probability is given by $\Pi_n = 1 - q_n$, where

(5.4)  $q_n = (1-\Pi)(1-p)^n \prod_{i=1}^{n} f_0(X_i) / D_n$

and

(5.5)  $D_n = (\Pi + (1-\Pi)p) \prod_{i=1}^{n} f_1(X_i) + (1-\Pi)\, p \sum_{j=1}^{n-1} (1-p)^{j} \prod_{i=1}^{j} f_0(X_i) \prod_{i=j+1}^{n} f_1(X_i) + (1-\Pi)(1-p)^n \prod_{i=1}^{n} f_0(X_i)$.

Let $R(X_i) = f_1(X_i)/f_0(X_i)$, $i = 1, 2, \ldots$, and let $\tilde{D}_n = D_n / \prod_{i=1}^{n} f_0(X_i)$. Then

(5.6)  $q_{n+1} = \dfrac{(1-\Pi)(1-p)^{n+1}}{R(X_{n+1})\,[\tilde{D}_n - (1-\Pi)(1-p)^n] + B_{n+1}}$

where

$B_{n+1} = R(X_{n+1})(1-\Pi)(1-p)^n p + (1-\Pi)(1-p)^{n+1}$.

But $(1-\Pi)(1-p)^n = q_n \tilde{D}_n$. Hence,

(5.7)  $q_{n+1} = \dfrac{q_n (1-p)}{R(X_{n+1})(1 - q_n(1-p)) + q_n(1-p)}$

or

(5.8)  $\Pi_{n+1} = \dfrac{(\Pi_n + (1-\Pi_n)p)\, R(X_{n+1})}{(\Pi_n + (1-\Pi_n)p)\, R(X_{n+1}) + (1-\Pi_n)(1-p)}$

$n = 0, 1, \ldots$ with $\Pi_0 = \Pi$ and $q_0 = 1 - \Pi$. Accordingly, the sequence of posterior probabilities $\{\Pi_n;\ n \ge 0\}$ is Markovian, i.e., the conditional distribution of $\Pi_{n+1}$ depends on the first $n$ observations $X_1, \ldots, X_n$ only through $\Pi_n$. This leads immediately to a recursive determination of the distribution of any stopping variable depending only on $\Pi_n$ (see Zacks [71]).

Shiryaev [60] has shown that when $F_0$ and $F_1$ are known, the optimal stopping variable, with respect to the above objectives, is to stop at the smallest $n$ for which $\Pi_n \ge A^*$, for some $0 < A^* < 1$. Bather [7] has shown that for the constraint of bounding the expected number of false alarms by $N$, $A^* = (N+1)^{-1}$ is the optimal stopping boundary.

When the distributions $F_0$ and $F_1$ are not completely specified, the above problem of finding optimal stopping variables becomes much more complicated. Zacks and Barzily [69] studied Bayes procedures for detecting shifts in the probability of success, $\theta$, of Bernoulli trials, when the value $\theta_0$, before the shift, and the value $\theta_1$ after it, are unknown. The Bayesian model assumed that $\theta_0$ and $\theta_1$ have a uniform prior distribution over the simplex $\{(\theta_0, \theta_1);\ 0 < \theta_0 \le \theta_1 < 1\}$ and the point of shift, $\tau$, has the prior distribution (5.2). In this case, the posterior probability $\Pi_n$ depends on the whole vector of observations $X_1, \ldots, X_n$, and not only on $\Pi_{n-1}$ and $X_n$. It is shown that this posterior probability is a function of $\mathbf{X}_n = (X_1, \ldots, X_n)$ given by


(5.9)  $\Pi_n(\mathbf{X}_n) = 1 - (1-\Pi)(1-p)^n\, B(T_n + 1,\ n - T_n + 2) / D_n(\mathbf{X}_n)$

where $B(p, q)$ is the beta-function; $T_j = \sum_{i=1}^{j} X_i$;


D  (X  )  -  n  B(T  +  2  ,  n  -  T  +  1)  + 
n  ~n  n  n 


(l-n)p  l  (l-p)J-1  B(T*nJ  +  ,  n  -  j  -  T^nJ  +  1)  . 

n  J  n  J 

(5.10) 

T(n) 

i*0\  1  J  B(Tn  +  1  ,  n  -  Tn  +  2) 


+  (1-n)  (1-p)  A  B(T  +  1  ,  n  -  T  +  2) 

n  n 

Here, $T^{(n)}_{n-j} = T_n - T_j$ ($j = 0, \ldots, n$). The sequence $\{\Pi_n(\mathbf{X}_n);\ n \ge 1\}$ is not Markovian, but is a submartingale. Zacks and Barzily considered the problem of determining the optimal stopping rule under the following cost conditions:

After  each  observation  we  have  the  option  to  stop  observations  and  declare  that  a 
shift  has  occurred.  The  process  is  then  inspected.  If  the  shift  has  not  yet 
occurred  a  penalty  of  1  unit  is  imposed.  If,  on  the  other  hand,  the  shift  has 
already  occurred,  a  penalty  of  C  units  per  delayed  observation  (or  time  unit) 
is  imposed.  It  is  shown  then  that  the  optimal  stopping  variable  is 


(5.11)  $N$ = least $n \ge 1$ such that $\Pi_n(\mathbf{X}_n) \ge b_n(\mathbf{X}_n)$

where the stopping boundary $b_n(\mathbf{X}_n)$ is given implicitly, as the limit for $j \to \infty$ of

(5.12)  $b_n^{(j)}(\mathbf{X}_n) = \min\left(\Pi^* - \dfrac{M_n^{(j-1)}(\mathbf{X}_n)}{C + p},\ 1\right)$

with $\Pi^* = p/(C+p)$ and the functions $M_n^{(j)}(\mathbf{X}_n)$ can be determined recursively, according to the formula

(5.13)  $M_n^{(j)}(\mathbf{X}_n) = E\{\min(0,\ C\, \Pi_{n+1}(\mathbf{X}_n, X_{n+1}) - p\,(1 - \Pi_{n+1}(\mathbf{X}_n, X_{n+1})) + M_{n+1}^{(j-1)}(\mathbf{X}_n, X_{n+1})) \mid \mathbf{X}_n\}$


It is very difficult, if not impossible, to determine these functions explicitly for large values of $j$. The authors therefore considered a suboptimal procedure based on $b_n^{(2)}(\mathbf{X}_n)$ only. Numerical simulations illustrate the performance of the suboptimal procedure.
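For the simpler case of this section in which both distributions are known, the recursion (5.8) and a threshold stopping rule of Shiryaev's type are easy to sketch. The Python fragment below is an illustration only: the function names, the threshold value and the normal likelihood ratio in the usage note are assumptions, not taken from the cited papers.

```python
import math


def shiryaev_update(pi_n, r, p):
    """One step of (5.8): posterior after a new observation with likelihood ratio r."""
    a = (pi_n + (1.0 - pi_n) * p) * r
    return a / (a + (1.0 - pi_n) * (1.0 - p))


def shiryaev_stop(xs, likelihood_ratio, pi0, p, threshold):
    """Stop at the first n with posterior >= threshold.

    Returns (n, path of posteriors); n is None if the threshold is never reached.
    """
    pi_n = pi0
    path = []
    for n, x in enumerate(xs, start=1):
        pi_n = shiryaev_update(pi_n, likelihood_ratio(x), p)
        path.append(pi_n)
        if pi_n >= threshold:
            return n, path
    return None, path
```

As a usage example, for a shift from $N(0,1)$ to $N(1,1)$ the likelihood ratio is $R(x) = \exp(x - 1/2)$, so one may call `shiryaev_stop(xs, lambda x: math.exp(x - 0.5), 0.01, 0.05, 0.9)`. When the likelihood ratio equals 1 the update reduces to $\Pi_n + (1 - \Pi_n)p$, which is the prior drift alone.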

5.2  Asymptotically  Minimax  Rules  and  The  CUSUM  Control 

Lorden  [42,43]  considered  the  sequential  detection  procedure  from  a  non- 

Bayesian  point  of  view  and  proved  that  the  well  known  CUSUM  procedures  of  Page 

[47,48,49]  are  asymptotically  minimax. 

Let $X_1, X_2, \ldots$ be a sequence of independent random variables. The distribution of $X_1, \ldots, X_{m-1}$ is $F_0(x)$ and that of $X_m, X_{m+1}, \ldots$ is $F_1(x)$. The point of shift $m$ is unknown; $F_0(x)$ and $F_1(x)$ are known. The family of probability measures is $\{P_m;\ m = 1, 2, \ldots\}$, where $P_m(\mathbf{X}_n)$ is the joint p.d.f. of $\mathbf{X}_n = (X_1, \ldots, X_n)$, in which $X_m$ is the first random variable with a c.d.f. $F_1(x)$.

It is desired to devise a sequential procedure with a (possibly) extended stopping variable, $N$, (i.e., $\lim_{n \to \infty} P_m[N > n] \ge 0$, $m = 0, 1, \ldots$) which minimizes the largest possible expectation of delayed action, and does not lead to too many false alarms. More precisely, if $P_0(\mathbf{X})$ denotes the c.d.f. under the assumption that all observations have $F_0(x)$ as a c.d.f., and if $E_m\{\cdot\}$ denotes expectation under $P_m(\cdot)$, the objective is to minimize

(5.14)  $\bar{E}_1(N) = \sup_{m \ge 1}\ \operatorname{ess\,sup}\ E_m\{(N - m + 1)^+ \mid \mathcal{F}_{m-1}\}$

subject to the constraint

(5.15)  $E_0\{N\} \ge \gamma$.

$E_m\{\cdot \mid \mathcal{F}_{m-1}\}$ denotes the conditional expectation given the $\sigma$-field generated by $(X_1, \ldots, X_{m-1})$. It is proven by Lorden [43] that an asymptotically minimax procedure, as $\gamma \to \infty$, is provided by Page's procedure, which is described below.
Let $R(X_i) = f_1(X_i)/f_0(X_i)$, $i = 1, 2, \ldots$, where $f_i(x)$ is the p.d.f. corresponding to $F_i(x)$, $i = 0, 1$. Let $S_0 = 0$, $S_k = \sum_{i=1}^{k} \log R(X_i)$, $k = 1, 2, \ldots$, and $T_n = S_n - \min_{0 \le k \le n} S_k$. Then for $\gamma^* = \log \gamma$,

(5.16)  $N^*$ = least $n \ge 1$ such that $T_n \ge \gamma^*$,

is Page's (extended) stopping variable. The statistic $T_n$ can be computed recursively by the formula

(5.17)  $T_{n+1} = (T_n + \log R(X_{n+1}))^+$.

The above detection procedure can be considered as a sequence of one-sided Wald's SPRT with boundaries $(0, \gamma^*)$. Whenever the $T_n$ statistic hits the lower boundary, 0, the SPRT is recycled, and all the previous observations can be discarded. On the other hand, the first time $T_n \ge \gamma^*$ the sampling process is stopped. The repeated cycles are independent and identically distributed. Thus, Wald's theory of SPRT can be used to obtain the main results of the present theory.
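The recursion (5.17) makes Page's procedure a one-line update per observation. The following Python sketch is illustrative only: the function name `page_cusum` and the normal log-likelihood ratio in the usage note are assumptions, and the statistic is started at zero.

```python
import math


def page_cusum(xs, log_lr, gamma):
    """Run Page's CUSUM with threshold log(gamma).

    Returns (n, path of T_1..T_n); n is None if the threshold is never reached.
    """
    threshold = math.log(gamma)
    t = 0.0  # T_0 = 0
    path = []
    for n, x in enumerate(xs, start=1):
        # Recursion (5.17): reset to 0 whenever the sum would go negative.
        t = max(0.0, t + log_lr(x))
        path.append(t)
        if t >= threshold:
            return n, path
    return None, path
```

For a shift from $N(0,1)$ to $N(1,1)$ the log-likelihood ratio is $x - 1/2$. With the sample `[0, 0, 0, 3, 3, 3]` and `gamma = 100` the statistic stays at 0 for the first three observations, then rises to 2.5 and 5.0, crossing $\log 100 \approx 4.6$ at the fifth observation.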

Let $\alpha$ and $\beta$ be the error probabilities in each such independent cycle of Wald's SPRT; i.e., $\alpha = P_0[T_{N_1} \ge \gamma^*]$ and $\beta = P_1[T_{N_1} = 0]$, where $N_1$ is the length of a cycle. Accordingly,

(5.18)  $E_0\{N^*\} = \dfrac{1}{\alpha} E_0\{N_1\}$  and  $E_1\{N^*\} = \dfrac{1}{1-\beta} E_1\{N_1\}$.

Set $\gamma = 1/\alpha$; then the constraint (5.15) is satisfied, since $E_0\{N_1\} \ge 1$. Moreover, Lorden proved that $\bar{E}_1\{N^*\} = E_1\{N^*\}$. Finally, applying well known results on the expected sample size in Wald's SPRT, we obtain

(5.19)  $E_1\{N^*\} \sim \dfrac{-\log \alpha}{I_1}$,  as $\alpha \to 0$

where $I_1 = E_1\{\log \frac{f_1(X)}{f_0(X)}\}$ is the Kullback-Leibler information for discriminating between $F_0$ and $F_1$.

The right hand side of (5.19) was shown to be the asymptotically minimum expected sample size. Thus, Page's procedure is asymptotically minimax.

In [42] Lorden and Eisenberg applied the theory presented here to solve a problem of life testing for a reliability system. It is assumed that the life length of the system is distributed exponentially, with intensity (failure-rate) $\lambda$. At an unknown time point, $\theta$, the failure rate shifts from $\lambda$ to $\lambda(1+\eta)$, $0 < \eta_1 \le \eta \le \eta_2 < \infty$. Approximations to the formulae of $E_0\{N^*\}$ and $E_1\{N^*\}$ are given, assuming that $\lambda$ is known. By proper transformations of the statistics the detection procedure can be applied also to cases of unknown $\lambda$. It is interesting to present some of the numerical results of this study. For the case of $\lambda = 1$ and $\alpha = 1/\gamma$ the expected number of observations required is


 $\eta$    $\gamma$    $E_0\{N\}$    $E_1\{N\}$
  .4        20          422           48
  .6        50          676           36
  .9        40          342           20

Page's  CUSUM  procedure  is  thus  very  conservative,  relative  to  the  Bayes  procedures 
which  detect  the  shifts  fast,  but  have  also  small  Eq{N}  . 
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